Memory-based Classification of Proper Names in Norwegian

Author

  • Anders Nøklestad
Abstract

This paper describes the classifier component of a named entity recogniser for Norwegian that uses memory-based learning to categorise proper names. Names are classified into one of the categories Person, Organisation, Location, Work, Event, or Other. We test the effect of using different features as input to the model, ranging from knowledge-poor features such as windows of inflected forms to features that require high-level processing such as syntactic analysis. We run training sessions with four different values of k for the k-nearest neighbour classifier and with four different feature weighting schemes. We also apply a document-centred approach (a one-sense-per-discourse strategy) to the output of the memory-based learner. We find that the most important features are the use of gazetteers and the inclusion of lemmas that constitute multi-word proper names, and that document-centred post-processing contributes substantially to the performance of the classifier. The best version of the classifier achieves an accuracy of 90.67% using leave-one-out testing and 83.18% using ten-fold cross-validation. The classifier outperforms a maximum entropy model using the same set of
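To make the memory-based setup concrete, here is a minimal sketch of a k-nearest neighbour classifier over symbolic features, with each feature weighted by its information gain and distance computed as a weighted overlap (in the style of the IB1-IG algorithm). The abstract does not name the four weighting schemes, the feature set, or the software used, so information gain, the toy features (previous word, gazetteer hit, a dummy feature) and all function names here are illustrative assumptions, not the paper's implementation. The leave-one-out helper classifies each stored instance against all the others, which is cheap for memory-based learners because no model has to be retrained.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain_weights(instances, labels):
    """Information gain of each symbolic feature with respect to the class."""
    base = entropy(labels)
    weights = []
    for f in range(len(instances[0])):
        # Partition the class labels by the value of feature f.
        by_value = {}
        for inst, lab in zip(instances, labels):
            by_value.setdefault(inst[f], []).append(lab)
        remainder = sum(len(labs) / len(labels) * entropy(labs) for labs in by_value.values())
        weights.append(base - remainder)
    return weights


def weighted_overlap(a, b, weights):
    """Distance: the summed weights of the features whose values differ."""
    return sum(w for x, y, w in zip(a, b, weights) if x != y)


def knn_classify(query, instances, labels, weights, k=1, skip=None):
    """Majority class among the k nearest stored instances.

    `skip` is the index of a stored instance to ignore (used for leave-one-out).
    Distance ties are broken by instance order.
    """
    ranked = sorted(
        (weighted_overlap(query, inst, weights), i)
        for i, inst in enumerate(instances)
        if i != skip
    )
    nearest = [labels[i] for _, i in ranked[:k]]
    return Counter(nearest).most_common(1)[0][0]


def leave_one_out_accuracy(instances, labels, k=1):
    """Classify each instance using all the others as memory.

    Simplification: feature weights are computed once on the full set.
    """
    weights = information_gain_weights(instances, labels)
    correct = sum(
        knn_classify(inst, instances, labels, weights, k=k, skip=i) == labels[i]
        for i, inst in enumerate(instances)
    )
    return correct / len(instances)


if __name__ == "__main__":
    # Hypothetical features: (previous word, gazetteer hit, uninformative feature).
    X = [("i", "loc_gaz", "a"), ("til", "loc_gaz", "b"),
         ("sa", "per_gaz", "a"), ("sa", "per_gaz", "b")]
    y = ["Location", "Location", "Person", "Person"]
    w = information_gain_weights(X, y)  # [1.0, 1.0, 0.0]
    print(knn_classify(("i", "loc_gaz", "b"), X, y, w, k=1))  # Location
    print(leave_one_out_accuracy(X, y, k=1))  # 1.0
    print(leave_one_out_accuracy(X, y, k=3))  # 0.0
```

On this tiny toy set, k=1 classifies every held-out instance correctly while k=3 gets all of them wrong, because the two same-class neighbours are outvoted by the other class. This is one reason the value of k is worth tuning, as the paper does. Plain information gain also tends to favour features with many distinct values; gain ratio is a common correction.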

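The document-centred post-processing step can be illustrated with a minimal sketch of a one-sense-per-discourse rule: every occurrence of the same name string within a document is relabelled with the label most often predicted for that string in the document. The abstract does not specify the exact voting rule, so the function name, the exact-string matching, and the first-predicted tie-breaking are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def one_sense_per_document(predictions):
    """Relabel every occurrence of a name with its majority label in the document.

    `predictions` is a list of (name, label) pairs for one document, in text order.
    Ties go to the label that was predicted first for that name.
    """
    votes = defaultdict(Counter)
    for name, label in predictions:
        votes[name][label] += 1
    # Counter.most_common lists equal counts in first-encountered order.
    majority = {name: counts.most_common(1)[0][0] for name, counts in votes.items()}
    return [(name, majority[name]) for name, _ in predictions]


if __name__ == "__main__":
    doc = [("Bergen", "Location"), ("Bergen", "Organisation"),
           ("Bergen", "Location"), ("Hansen", "Person")]
    print(one_sense_per_document(doc))
```

Here the isolated Organisation prediction for "Bergen" is overridden by the two Location predictions elsewhere in the same document, which is the kind of correction a document-centred step can make.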
Similar resources

A System for Recognizing and Classifying Names in Persian Texts

Named entity recognition (NER) systems identify one or more kinds of names in a text and classify them into specified categories. These categories can include names of people, organizations, companies, and places (country, city, street, etc.), temporal expressions (dates and times), financial values, percentages, etc. Although much research has been done on NER during the past decade in ...

Ontology-driven Conceptual Document Classification

Document classification based on the lexical-semantic network WordNet is presented. Two types of document classification in Serbian were tested: classification based on chosen concepts from Serbian WordNet (SWN) and classification based on proper names. Conceptual document classification criteria are constructed from hierarchies rooted in a set of chosen concepts (first case) or...

The Place-Name as an Intangible Place of Memory (A Holistic Approach in Reading the Place-Names through a Comparative-Analytical Study on the Character of Name and Place)

Understanding architectural heritage and its various aspects has always been a focus of the international conservation community. In recent decades, even though place-names are part of living history as well as cultural heritage, they have constantly faced rapid and abrupt changes. As such, in the conservation literature, most studies have skipped ad...

A new nomenclature for fungi

Important changes brought about by the Melbourne International Code of Nomenclature for Algae, Fungi and Plants are briefly reviewed concerning a clarification of the spelling and typification of sanctioned fungal names, the recognition of electronic publication for the validity of nomenclatural novelties, permission to use English diagnoses or descriptions for their valid publication, and the requ...

Retrieval of names in face and object naming in an interference study.

Two experiments using the interference paradigm are reported. In the first experiment, the participants spoke aloud the names of celebrities and the names of objects when presented with pictures while hearing distractors. In the case of proper names, we replicated the data obtained by Izaute and Bonin (2001) using the interference paradigm with a proper name written naming task. In the case of ...

Publication date: 2004